(7, "<B>", "</B>", "<I>", "</I>", "</TABLE>")



procedure pcwrapHLRT (page P)
    scan to line 7 in P
    while the next occurrence of li in P occurs before the next occurrence of t
        scan in P to next occurrence of li; save position as start of item attribute
        scan in P to next occurrence of ri; save position as end of item attribute
        scan in P to next occurrence of lp; save position as start of price attribute
        scan in P to next occurrence of rp; save position as end of price attribute
    return extracted {..., (item, price), ...} pairs
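
The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the delimiters are those of the wrapper tuple above with the tail read as "</TABLE>", "scan to line 7" is taken to mean "position at the start of the seventh line", and the PAGE string is a reconstruction of the sample catalog page, not text taken verbatim from the source.

```python
# Reconstructed sample catalog page (an assumption; whitespace is a guess).
PAGE = "\n".join([
    "<HTML>",
    "<TITLE><B>A Simple Product Catalogs</B></TITLE>",
    "<BODY>",
    "<H2>MD Price</H2>",
    "<TABLE BORDER=1>",
    "<TR BGCOLOR=ORANGE><TH>Model Number</TH>",
    "<TH>PRICE(US$)</TH></TR>",
    "<TR><TD><B>HM381MD</B></TD><TD><I>399.95</I></TD></TR>",
    "<TR><TD><B>MD2070</B></TD><TD><I>599.95</I></TD></TR>",
    "<TR><TD><B>MD203</B></TD><TD><I>249.95</I></TD></TR>",
    "<TR><TD><B>MDR3</B></TD><TD><I>399.95</I></TD></TR>",
    "</TABLE>",
    "<P>",
    "<HR WIDTH=200 ALIGN=LEFT>",
    "<P>",
    "<B>End Of The Product Catalog</B>",
    "</BODY>",
    "</HTML>",
])


def pcwrap_hlrt(page: str) -> list[tuple[str, str]]:
    """Hard-coded wrapper: return the (item, price) text pairs of the page."""
    head_line, l_i, r_i, l_p, r_p, tail = 7, "<B>", "</B>", "<I>", "</I>", "</TABLE>"

    # Scan to the start of line `head_line` (1-based).
    pos = 0
    for _ in range(head_line - 1):
        pos = page.index("\n", pos) + 1

    pairs = []
    while True:
        next_item = page.find(l_i, pos)
        next_tail = page.find(tail, pos)
        # Stop when no item delimiter is left or the tail comes first.
        if next_item == -1 or (next_tail != -1 and next_tail < next_item):
            break
        item_start = next_item + len(l_i)
        item_end = page.index(r_i, item_start)
        price_start = page.index(l_p, item_end + len(r_i)) + len(l_p)
        price_end = page.index(r_p, price_start)
        pairs.append((page[item_start:item_end], page[price_start:price_end]))
        pos = price_end + len(r_p)
    return pairs
```

The tail check is what keeps the bold "End Of The Product Catalog" line after the table from being mistaken for a fifth item.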
Element Name               Element Description

Vendor Name                Name of the online shop.
Vendor URL                 The URL of the online shop.
Form URL                   The URL to submit search data.
Learning Domain            The domain used to learn the vendor description.
Head                       The end-of-header position of the vendor's product pages.
                           This element is learnt to reduce the product page
                           searching space and thus shorten the product
                           information extraction time.
Tail                       The start-of-footer position of the vendor's product pages.
                           As with the Head element, this element shortens the
                           product information extraction time for the Shopper Agent.
Left Delimiter of Item,    The Shopper Agent uses these two delimiters to identify
Right Delimiter of Item    a product.
Left Delimiter of Price,   The Shopper Agent uses these two delimiters to locate
Right Delimiter of Price   a product's price information.
EXAMPLE

Element Name               Element Description

Vendor Name                800.com
Vendor URL                 http://www.800.com
Form URL                   http://www.800.com/search/srchrslts.asp?qs=1&slteentry=AII&entry=
Learning Domain            MD
Head                       1230
Tail                       </BODY>
Left Delimiter of Item     <td width="12">
Right Delimiter of Item    </fon
Left Delimiter of Price    Your Price: $
Right Delimiter of Price   </b>
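
As an illustration of how a vendor description like the one above might be applied, the following Python sketch extracts (item, price) pairs from a result page. The names VendorDescription and extract_products are hypothetical, Head is assumed to be a character offset and Tail a literal marker string (suggested by the example values 1230 and </BODY>), and the page used to exercise it would have to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorDescription:
    """Hypothetical container for the elements of a vendor description."""
    vendor_name: str
    head: int          # assumed: character offset where the header ends
    tail: str          # assumed: marker string where the footer starts
    item_left: str
    item_right: str
    price_left: str
    price_right: str


def extract_products(desc: VendorDescription, page: str) -> list[tuple[str, str]]:
    """Return (item, price) pairs found between the head and the tail."""
    # Restrict the search space to the region between Head and Tail.
    end = page.find(desc.tail, desc.head)
    body = page[desc.head:] if end == -1 else page[desc.head:end]

    products = []
    pos = 0
    while True:
        start = body.find(desc.item_left, pos)
        if start == -1:
            break
        start += len(desc.item_left)
        stop = body.find(desc.item_right, start)
        if stop == -1:
            break
        p_start = body.find(desc.price_left, stop)
        if p_start == -1:
            break
        p_start += len(desc.price_left)
        p_stop = body.find(desc.price_right, p_start)
        if p_stop == -1:
            break
        products.append((body[start:stop].strip(), body[p_start:p_stop].strip()))
        pos = p_stop + len(desc.price_right)
    return products
```

Cutting the page down to the Head..Tail region before searching is the step the table credits with shortening extraction time.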
MD PRICE

Model Number    PRICE (US$)
HM381MD         399.95
MD2070          599.95
MD203           249.95
MDR3            399.95

<HTML>
<TITLE><B>A Simple Product Catalogs</B></TITLE>
<BODY>
<H2>MD Price</H2>
<TABLE BORDER=1>
<TR BGCOLOR=ORANGE><TH>Model Number</TH>
<TH>PRICE(US$)</TH></TR>
<TR><TD><B>HM381MD</B></TD><TD><I>399.95</I></TD></TR>
<TR><TD><B>MD2070</B></TD><TD><I>599.95</I></TD></TR>
<TR><TD><B>MD203</B></TD><TD><I>249.95</I></TD></TR>
<TR><TD><B>MDR3</B></TD><TD><I>399.95</I></TD></TR>
</TABLE>
<P>
<HR WIDTH=200 ALIGN=LEFT>
<P>
<B>End Of The Product Catalog</B>
</BODY>
</HTML>



FIGURE 7 



Product Entry          Labels

{HM381MD, 399.95}      {<<174, 180>, <197, 202>>}
{MD2070, 599.95}       {<<229, 234>, <251, 256>>}
{MD203, 249.95}        {<<283, 287>, <304, 309>>}
{MDR3, 399.95}         {<<336, 339>, <356, 361>>}



FIGURE 8 
FIGURE 9 
FIGURE 15A

FIGURE 24



procedure execHLRT (wrapper (h, li, ri, lp, rp, t), page P)
    m <- 0
    scan in P to line h
    while the next occurrence of li in P occurs before the next occurrence of t
        m <- m + 1
        scan in P to the next occurrence of li; save position as bm,i
        scan in P to the next occurrence of ri; save position as em,i
        scan in P to the next occurrence of lp; save position as bm,p
        scan in P to the next occurrence of rp; save position as em,p
    return label {..., <<bm,i, em,i>, <bm,p, em,p>>, ...}

FIGURE 25 
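
A minimal Python sketch of execHLRT follows. It is an interpretation, not a definitive implementation: positions are assumed to be 0-based character offsets with inclusive end positions, "line h" is assumed to mean the start of the h-th line, and PAGE is a reconstruction of the sample catalog page with padding whitespace removed. Under those assumptions the sketch reproduces the labels listed in FIGURE 8.

```python
from typing import NamedTuple


class Wrapper(NamedTuple):
    h: int      # 1-based line at which scanning starts
    l_i: str    # left delimiter of item
    r_i: str    # right delimiter of item
    l_p: str    # left delimiter of price
    r_p: str    # right delimiter of price
    t: str      # tail delimiter


# Reconstructed sample page (an assumption; whitespace is a guess).
PAGE = "\n".join([
    "<HTML>",
    "<TITLE><B>A Simple Product Catalogs</B></TITLE>",
    "<BODY>",
    "<H2>MD Price</H2>",
    "<TABLE BORDER=1>",
    "<TR BGCOLOR=ORANGE><TH>Model Number</TH>",
    "<TH>PRICE(US$)</TH></TR>",
    "<TR><TD><B>HM381MD</B></TD><TD><I>399.95</I></TD></TR>",
    "<TR><TD><B>MD2070</B></TD><TD><I>599.95</I></TD></TR>",
    "<TR><TD><B>MD203</B></TD><TD><I>249.95</I></TD></TR>",
    "<TR><TD><B>MDR3</B></TD><TD><I>399.95</I></TD></TR>",
    "</TABLE>",
    "<P>",
    "<HR WIDTH=200 ALIGN=LEFT>",
    "<P>",
    "<B>End Of The Product Catalog</B>",
    "</BODY>",
    "</HTML>",
])


def exec_hlrt(w: Wrapper, page: str) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return one ((b_i, e_i), (b_p, e_p)) entry per extracted product."""
    pos = 0
    for _ in range(w.h - 1):                 # scan in P to line h
        pos = page.index("\n", pos) + 1

    label = []
    while True:
        next_l = page.find(w.l_i, pos)
        next_t = page.find(w.t, pos)
        if next_l == -1 or (next_t != -1 and next_t < next_l):
            break
        b_i = next_l + len(w.l_i)
        e_i = page.index(w.r_i, b_i)          # exclusive end of the item text
        b_p = page.index(w.l_p, e_i + len(w.r_i)) + len(w.l_p)
        e_p = page.index(w.r_p, b_p)          # exclusive end of the price text
        label.append(((b_i, e_i - 1), (b_p, e_p - 1)))
        pos = e_p + len(w.r_p)
    return label
```

The end positions are converted to inclusive indices only when the label entry is built, so the scanning itself can use ordinary Python slicing offsets.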



procedure learnHLRT (examples E)

1. Generate the candidate sets candsr(i, E), candsl(i, E), candsr(p, E), candsl(p, E).

2. Enumerate the cross product of these candidate sets; each element W = (h, li, ri,
lp, rp, t) of this cross product is a wrapper. Halt if W is satisfactory, i.e.,
execHLRT(W, Pn) = Ln for every (Pn, Ln) ∈ E.



FIGURE 26 
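
The enumerate-and-test strategy of this figure can be sketched in Python as below. The function names are hypothetical, the candidate sets are taken as inputs rather than generated, and the bundled exec_hlrt is a tolerant variant (it returns None instead of failing when a candidate wrapper cannot be applied) written only to make the sketch self-contained.

```python
from itertools import product


def exec_hlrt(w, page):
    """Apply wrapper w = (h, l_i, r_i, l_p, r_p, t); None if it cannot be applied."""
    h, l_i, r_i, l_p, r_p, t = w
    lines = page.split("\n")
    if h < 1 or h > len(lines) or not all((l_i, r_i, l_p, r_p, t)):
        return None
    pos = sum(len(line) + 1 for line in lines[: h - 1])   # start of line h
    label = []
    while True:
        next_l, next_t = page.find(l_i, pos), page.find(t, pos)
        if next_l == -1 or (next_t != -1 and next_t < next_l):
            return label
        spans = []
        for left, right in ((l_i, r_i), (l_p, r_p)):
            b = page.find(left, pos)
            if b == -1:
                return None
            b += len(left)
            e = page.find(right, b)
            if e == -1:
                return None
            spans.append((b, e - 1))          # inclusive end position
            pos = e + len(right)
        label.append(tuple(spans))


def learn_by_enumeration(candidate_sets, examples):
    """Try every combination of candidates; return the first satisfactory wrapper.

    candidate_sets: six iterables, in the order (h, l_i, r_i, l_p, r_p, t).
    examples: list of (page, label) pairs.
    """
    for w in product(*candidate_sets):
        if all(exec_hlrt(w, page) == label for page, label in examples):
            return w
    return None
```

The cost of this approach grows with the product of the six candidate-set sizes, which is why a learner that fixes the delimiters one at a time is attractive.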
/*
 * The main procedure
 */
procedure learnHLRT (examples E)
    for each u ∈ candsl(p, E): if validl(u, p, E) then lp <- u and terminate this loop
    for each u ∈ candsr(i, E): if validr(u, i, E) then ri <- u and terminate this loop
    for each u ∈ candsr(p, E): if validr(u, p, E) then rp <- u and terminate this loop
    for each uli ∈ candsl(i, E)
        for each uh ∈ candsh(E)
            for each ut ∈ candst(E)
                if validli,h,t(uli, uh, ut, E) then
                    li <- uli, h <- uh, t <- ut, and terminate these three loops
    return HLRT wrapper (h, li, ri, lp, rp, t)

/*
 * Generate a set of candidates for the left delimiter of the price attribute
 */
procedure candsl(attribute price, examples E)
    return the set of all suffixes of the shortest string in neighborsl(price, E)

/*
 * Generate a set of candidates for the right delimiter of attribute a. Here a could be item or price
 */
procedure candsr(attribute a, examples E)
    return the set of all prefixes of the shortest string in neighborsr(a, E)

/*
 * Generate a set of candidates for a page's head
 */
procedure candsh(examples E)
    return the set of all substrings of the shortest string in heads(E)

/*
 * Generate a set of candidates for a page's tail
 */
procedure candst(examples E)
    return the set of all substrings of the shortest string in tails(E)
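
The four candidate generators above reduce to taking suffixes, prefixes, or substrings of the shortest string in a set. A minimal Python sketch, assuming the neighbor, head, and tail strings have already been collected into plain lists and that the empty string is not a useful candidate:

```python
def suffixes(s: str) -> set[str]:
    """All non-empty suffixes of s."""
    return {s[i:] for i in range(len(s))}


def prefixes(s: str) -> set[str]:
    """All non-empty prefixes of s."""
    return {s[:i] for i in range(1, len(s) + 1)}


def substrings(s: str) -> set[str]:
    """All non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def cands_l(neighbors_l: list[str]) -> set[str]:
    """Left-delimiter candidates: suffixes of the shortest left neighbor."""
    return suffixes(min(neighbors_l, key=len))


def cands_r(neighbors_r: list[str]) -> set[str]:
    """Right-delimiter candidates: prefixes of the shortest right neighbor."""
    return prefixes(min(neighbors_r, key=len))


def cands_h(heads: list[str]) -> set[str]:
    """Head candidates: substrings of the shortest page head."""
    return substrings(min(heads, key=len))


def cands_t(tails: list[str]) -> set[str]:
    """Tail candidates: substrings of the shortest page tail."""
    return substrings(min(tails, key=len))
```

Using only the shortest string is sufficient because a delimiter that must be a suffix (or prefix, or substring) of every string in the set must in particular be one of the shortest.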

/*
 * Determine whether a particular candidate for the left delimiter of price is valid
 * This procedure applies constraint C3
 */
procedure validl(candidate u, attribute price, examples E)
    for each s ∈ neighborsl(price, E): if u is not a proper suffix of s then return FALSE

FIGURE 27A

    for each s ∈ tails(E): if u is a substring of s then return FALSE
    return TRUE

/*
 * Determine whether a particular candidate for the right delimiter of item or price is valid
 * This procedure applies constraints C1 and C2
 */
procedure validr(candidate u, attribute a, examples E)
    for each s ∈ attribs(a, E): if u is a substring of s then return FALSE
    for each s ∈ neighborsr(a, E): if u is not a prefix of s then return FALSE
    return TRUE
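
The two validity checks defined so far (validl for the left delimiter of price, validr for a right delimiter) can be sketched in Python as below. The functions take the already-collected string sets as arguments instead of the examples E, which is a simplification made for this sketch.

```python
def valid_l(u: str, neighbors_l: list[str], tails: list[str]) -> bool:
    """Check a candidate left delimiter of the price attribute."""
    for s in neighbors_l:
        # u must be a proper suffix: a suffix of s that is shorter than s.
        if not (s.endswith(u) and len(u) < len(s)):
            return False
    for s in tails:
        # u must not appear anywhere in a page tail.
        if u in s:
            return False
    return True


def valid_r(u: str, attribs: list[str], neighbors_r: list[str]) -> bool:
    """Check a candidate right delimiter of an attribute (item or price)."""
    for s in attribs:
        # u must not appear inside any attribute value.
        if u in s:
            return False
    for s in neighbors_r:
        # u must begin every string that follows the attribute.
        if not s.startswith(u):
            return False
    return True
```

For example, "</B>" would be accepted as a right delimiter of item when every item is followed by "</B></TD><TD><I>" and no item text contains "</B>".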

/*
 * Determine whether a particular combination of candidates uh, ut, and uli for h, t, and
 * li, respectively, is satisfactory.
 */
procedure validli,h,t(candidates uli, uh, ut, examples E)
    for each s ∈ heads(E)
        if uh is not a substring of s then return FALSE
        if uli is not a proper suffix of scan(s, uh) then return FALSE
        if ut occurs before uli in scan(s, uh) then return FALSE
    for each s ∈ tails(E)
        if ut is not a substring of s then return FALSE
        if uli occurs before ut in s then return FALSE
    for each s ∈ seps(E)
        if uli is not a proper suffix of s then return FALSE
        if ut occurs before uli in s then return FALSE
    return TRUE

/*
 * Return a set containing all values of an attribute, either item (i) or price (p), in each
 * example
 */
procedure attribs(attribute a, examples E)
    return ∪ over (Pn, Ln) ∈ E of {Pn[bm,a, em,a] | <..., <bm,a, em,a>, ...> ∈ Ln}

/*
 * Return all strings to the left of an attribute, whether these strings are in the heads
 * or the bodies of the pages
 */
procedure neighborsl(attribute a, examples E)
    if a = i then return seps(i, E) ∪ heads(E) else return seps(a, E)

FIGURE 27B
START

Call SQLAllocEnv
to allocate environment workspace
(provides HENV handle)

Call SQLAllocConnect
to allocate connection workspace
(provides HDBC handle)

Call SQLConnect
to connect to the database

Call SQLAllocStmt
to allocate a statement workspace
(provides HSTMT handle)

Call ODBC member functions
(SQLExecDirect, SQLFetch, SQLGetData)
to perform a database operation

Call SQLFreeStmt
to deallocate statement workspace
(invalidates HSTMT handle)

Call SQLDisconnect
to disconnect from the database

Call SQLFreeConnect
to deallocate connection workspace
(invalidates HDBC handle)

Call SQLFreeEnv
to deallocate environment workspace
(invalidates HENV handle)

STOP

FIGURE 30
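
The flowchart's ordering (allocate, connect, allocate a statement, operate, then release everything in reverse) can be illustrated in Python. Note that this sketch swaps in Python's built-in sqlite3 module as a stand-in, because real ODBC calls need a driver manager and a configured data source; the comments mapping each step to an ODBC call are an analogy only, and run_query and the products table are hypothetical.

```python
import sqlite3
from contextlib import closing


def run_query(database: str, setup_sql: str, query: str) -> list[tuple]:
    """Acquire resources, run one query, and release resources in reverse order."""
    # Roughly SQLAllocEnv + SQLAllocConnect + SQLConnect.
    conn = sqlite3.connect(database)
    try:
        # Roughly SQLAllocStmt; closing() plays the role of SQLFreeStmt.
        with closing(conn.cursor()) as stmt:
            stmt.executescript(setup_sql)     # prepare sample data
            stmt.execute(query)               # roughly SQLExecDirect
            return stmt.fetchall()            # roughly SQLFetch / SQLGetData
    finally:
        # Roughly SQLDisconnect + SQLFreeConnect + SQLFreeEnv.
        conn.close()
```

The try/finally and the context manager guarantee that the statement is released before the connection even when the query raises, which mirrors the strict nesting of handles in the flowchart.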
Call member function
s.GetHttpConnection
(creates CHttpConnection object c)

FIGURE 31

1002

1004

Call member function
c.OpenRequest
(creates CHttpFile object f)

1006

Call member function
f.SendRequest
(sends the POST request and form data)

1008

Call member function
f.Read
(returns chunk of response)

1012

1010

Process data

101
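
The request sequence in this figure (open a connection, open a request, send the POST with form data, read the response in chunks, process the data) can be sketched in Python. This is an analogue using the standard http.client module rather than the MFC classes named in the figure; post_form and its parameters are hypothetical names.

```python
import http.client
from urllib.parse import urlencode


def post_form(host: str, port: int, path: str, form: dict[str, str],
              chunk_size: int = 1024) -> tuple[int, bytes]:
    """POST url-encoded form data and read the response chunk by chunk."""
    # Analogue of GetHttpConnection: create the connection object.
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        body = urlencode(form)
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        # Analogue of OpenRequest + SendRequest: send the POST and form data.
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()

        chunks = []
        while True:
            chunk = response.read(chunk_size)   # analogue of f.Read
            if not chunk:
                break
            chunks.append(chunk)                # "process data" step
        return response.status, b"".join(chunks)
    finally:
        conn.close()
```

Reading in fixed-size chunks keeps memory use bounded for large result pages; here the chunks are simply concatenated, but a caller could hand each one to an extractor as it arrives.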